It is common, in deconvolution problems, to assume that the measurement errors are identically
distributed. In many real-life applications, however, this condition is not satisfied and the decon- 
volution estimators developed for homoscedastic errors become inconsistent. In this paper, we 
introduce a kernel estimator of a density in the case of heteroscedastic contamination. We estab- 
lish consistency of the estimator and show that it achieves optimal rates of convergence under 
quite general conditions. We study the limits of application of the procedure in some extreme 
situations, where we show that, in some cases, our estimator is consistent, even when the scaling 
parameter of the error is unbounded. We suggest a modified estimator for the problem where 
the distribution of the errors is unknown, but replicated observations are available. Finally, 
an adaptive procedure for selecting the smoothing parameter is proposed and its finite-sample 
properties are investigated on simulated examples. 

Keywords: bandwidth; density deconvolution; errors-in-variables; heteroscedastic
contamination; inverse problems; plug-in 

1. Introduction 

We consider nonparametric estimation of a density from a sample contaminated by ran- 
dom error. This problem, which is called a deconvolution problem, arises very frequently in 
real data applications since, in practice, one often introduces non-negligible measurement 
errors while observing the data. The fields of application are various and include astron- 
omy, biology, chemistry, economy and public health; see, for example, Merritt (1997) or 
the numerous examples described in Carroll et al. (2006). 

In the conventional case, the observations are a sample of independent and identically
distributed (i.i.d.) variables $Y_1, \ldots, Y_n$ generated by the model

$$Y_j = X_j + \varepsilon_j, \qquad X_j \sim f_X \quad\text{and}\quad \varepsilon_j \sim f_\varepsilon, \tag{1.1}$$

where the unknown density $f_X$ of $X_j$ is the quantity of interest, the $\varepsilon_j$ are the error
variables, independent of $X_j$, and $f_\varepsilon$ is known. In this context, Carroll and Hall (1988)
and Stefanski and Carroll (1990) proposed the deconvolution kernel density estimator.
Let $K$ be a square-integrable kernel function, $\omega_n > 0$ a smoothing parameter and, for
all $t$, assume $f_\varepsilon^{\mathrm{ft}}(t) \neq 0$, where $g^{\mathrm{ft}}$ denotes the Fourier transform of a function $g$. The
deconvolution kernel estimator is defined by

$$\hat f_n(x) = \frac{1}{2\pi} \int \exp(-itx)\, K^{\mathrm{ft}}(t/\omega_n)\, \frac{1}{n} \sum_{j=1}^n \frac{\exp(itY_j)}{f_\varepsilon^{\mathrm{ft}}(t)}\, dt; \tag{1.2}$$

see, for example, Fan (1991a, b), Fan (1993) and Masry (1993) for theoretical properties.
Recent contributions to density deconvolution include Zhang and Karunamuni (2000),
Carroll and Hall (2004), van Es and Uh (2005), Hall and Qiu (2005) and Hall and
Meister (2007).

In many applications of interest, the assumption of homoscedastic errors is too re- 
strictive to be realistic. Bennett and Franklin (1954) describe an experiment where some 
students were asked to assess the iron content of substances. Here, clearly, the measure- 
ment process, and hence the error distribution, is subjective and differs among individu- 
als. In some experiments, the error distribution depends on the type of individual under 
study (e.g., healthy or not, smoker or not, etc.) or on the measurement process. Here, as
soon as the sample contains observations of different types, the errors are not identically 
distributed in the sample; see Fuller (1987) for an early consideration of this problem. 
Heteroscedasticity also arises when the sample is formed by collating data from different
laboratories (see, e.g., National Research Council (1993)) or from different studies
(meta-analysis), or when contaminated replications available for each individual i are 
averaged to form a new sample of observations - a procedure often used in practice, 
because it reduces the scale of error. 

In Section 2, we formally introduce the heteroscedastic error model and propose a 
deconvolution kernel estimator of the density fx that accounts for heteroscedastic errors. 
We establish $L_2$-consistency of the estimator, obtain its rates of convergence and prove
that these are optimal. In Section 3, we study two important aspects of heteroscedastic 
contamination. We first consider the problem where different numbers of replicates are 
observed for each random variable Xj. We show that, in the case of normal contamination, 
averaging the replicates and then using the procedure derived in Section 2 leads to 
optimal convergence rates. Next, we discuss limiting cases of heteroscedastic errors with 
unbounded scaling parameters and give an equivalent criterion for the existence of a 
consistent estimator. Section 4 discusses some situations where the error distributions are 
unknown, but either replicated observations are available or more restrictive conditions 
on fx are assumed. We study finite-sample properties of our estimator in Section 5. 
We develop a data-driven bandwidth selector and give some numerical simulations. All 
proofs are deferred to Section 6.

2. Estimation procedure and asymptotics



2.1. The estimator 

We generalize model (1.1) to allow heteroscedastic contamination, leading to the model

$$Y_j = X_j + \varepsilon_j, \qquad X_j \sim f_X \quad\text{and}\quad \varepsilon_j \sim f_{\varepsilon_j}. \tag{2.1}$$

Now, each $\varepsilon_j$ has its own density $f_{\varepsilon_j}$, which may depend on both the observation number
$j$ and the sample size $n$. In this setting, where (1.2) can no longer be used, the estimator
we propose is defined by

$$\hat f_n(x) = \frac{1}{2\pi} \int \exp(-itx)\, K^{\mathrm{ft}}(t/\omega_n)\, \Psi_n(t)\, dt \tag{2.2}$$

with

$$\Psi_n(t) = \sum_{j=1}^n f_{\varepsilon_j}^{\mathrm{ft}}(-t) \exp(itY_j) \Big/ \Bigl( \sum_{k=1}^n |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2 \Bigr).$$

This estimator is well defined if we assume the following.

Condition A.

There exists some $j$ such that $|f_{\varepsilon_j}^{\mathrm{ft}}(t)| \neq 0$ for all $t \in \mathbb{R}$, (A.1)
$K^{\mathrm{ft}}(t)$ is bounded, continuous at $t = 0$ and $K^{\mathrm{ft}}(0) = 1$, (A.2)
$f_{\varepsilon_j}^{\mathrm{ft}}(\omega_n t) K^{\mathrm{ft}}(t) \big/ \bigl( \sum_{k=1}^n |f_{\varepsilon_k}^{\mathrm{ft}}(\omega_n t)|^2 \bigr) \in L_2(\mathbb{R})$ for $j = 1, \ldots, n$. (A.3)

These conditions are standard in deconvolution problems. In particular, in order
to satisfy (A.3), it is rather common to choose kernels that have a compactly supported
Fourier transform $K^{\mathrm{ft}}$. Such kernels are supported on the whole real line, examples
being the sinc kernel $K_1(x) = \sin x/(\pi x)$ and the kernel $K_2(x) = 48(\cos x)(1 -
15x^{-2})/(\pi x^4) - 144(\sin x)(2 - 5x^{-2})/(\pi x^5)$, which have respective characteristic
functions $K_1^{\mathrm{ft}}(t) = 1_{[-1,1]}(t)$, the indicator function of the interval $[-1,1]$, and $K_2^{\mathrm{ft}}(t) =
(1 - t^2)^3 1_{[-1,1]}(t)$.
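As a rough illustration of how the estimator (2.2) can be evaluated numerically, the sketch below specializes it to centred normal errors with known, observation-specific standard deviations and uses the kernel with Fourier transform $(1 - t^2)^3$ on $[-1,1]$. The function names, the choice of normal errors and the plain Riemann-sum integration are assumptions made for this example only, not the authors' implementation.

```python
import numpy as np

def kft_k2(t):
    """Fourier transform of the kernel K2: (1 - t^2)^3 on [-1, 1], zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def hetero_deconv_density(x, y, sigma, omega, n_t=2001):
    """Sketch of estimator (2.2), assuming eps_j ~ N(0, sigma_j^2) with known sigma_j.

    x: evaluation points; y: contaminated sample; sigma: error sd per observation;
    omega: smoothing parameter omega_n (the inverse of the bandwidth).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # K^ft(t / omega) vanishes outside [-omega, omega], so integrate there only.
    t = np.linspace(-omega, omega, n_t)
    # Error characteristic functions: rows index j, columns index t.
    f_eps = np.exp(-0.5 * np.outer(sigma**2, t**2))
    # Psi_n(t) = sum_j f_eps_j(-t) exp(i t Y_j) / sum_k |f_eps_k(t)|^2.
    # For centred normal errors f_eps_j is real and even, so f_eps_j(-t) = f_eps_j(t).
    numer = np.sum(f_eps * np.exp(1j * np.outer(y, t)), axis=0)
    denom = np.sum(f_eps**2, axis=0)
    psi = numer / denom
    integrand = np.exp(-1j * np.outer(x, t)) * (kft_k2(t / omega) * psi)
    dt = t[1] - t[0]
    # The integrand is zero at both end points, so a plain Riemann sum
    # coincides with the trapezoidal rule here.
    return (np.sum(integrand, axis=1) * dt / (2.0 * np.pi)).real
```

Because $\Psi_n(0) = 1$ and $K^{\mathrm{ft}}(0) = 1$, the resulting curve integrates to one up to numerical error, although it may take small negative values in the tails.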

An alternative estimator that can perhaps be seen as a more natural generalization
of (1.2) is the estimator obtained when using $n^{-1}\sum_{j=1}^n \exp(itY_j)\{f_{\varepsilon_j}^{\mathrm{ft}}(t)\}^{-1}$ instead of
$\Psi_n(t)$. A quick inspection of its properties, however, shows that this estimator suffers
from the convergence rates of the least favorable error $\varepsilon_j$ and is therefore not
acceptable. Another estimator of $f_X$, $\hat f_{n,2}(x)$, can be defined if we replace $\Psi_n(t)$ by
$\Phi_n(t) = \sum_{j=1}^n \exp(itY_j) \big/ \bigl(\sum_{k=1}^n f_{\varepsilon_k}^{\mathrm{ft}}(t)\bigr)$. As an advantage, applying this estimator requires
only knowledge of the set $\{f_{\varepsilon_1}, \ldots, f_{\varepsilon_n}\}$, but not the information about which observation
is corrupted by which of the error densities. However, it is less attractive in some cases
of non-symmetric $f_{\varepsilon_j}$ as, then, there is no guarantee that the denominator in $\Phi_n(t)$ does
not vanish, although each $f_{\varepsilon_j}^{\mathrm{ft}}$ is assumed to have no zeros. Also, the mean integrated
squared error of (2.2) is smaller than that of $\hat f_{n,2}$ and, therefore, for the most part, we
will focus our consideration on (2.2).

2.2. Asymptotic properties 

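We study asymptotic properties of our estimator by examining its mean integrated
squared error (MISE), defined by $\mathrm{MISE}_n(f_X) = E\|\hat f_n - f_X\|^2_{L_2(\mathbb{R})}$. The usual bias-variance
decomposition and the use of Parseval's identity lead to the following result.

Lemma 2.1. Under Condition A, if $f_X \in L_2(\mathbb{R})$, then the estimator (2.2) satisfies

$$\begin{aligned}
\mathrm{MISE}_n(f_X) ={}& \frac{1}{2\pi}\int |f_X^{\mathrm{ft}}(t)|^2\, |K^{\mathrm{ft}}(t/\omega_n) - 1|^2\, dt
+ \frac{1}{2\pi}\int |K^{\mathrm{ft}}(t/\omega_n)|^2 \Bigl(\sum_{k=1}^n |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2\Bigr)^{-1} dt \\
&- \frac{1}{2\pi}\int \Bigl(\sum_{k=1}^n |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2\Bigr)^{-2} \Bigl(\sum_{k=1}^n |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^4\Bigr) |f_X^{\mathrm{ft}}(t)|^2\, |K^{\mathrm{ft}}(t/\omega_n)|^2\, dt.
\end{aligned} \tag{2.3}$$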
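The bias and variance contributions in this lemma can be traced back to the first two moments of $\Psi_n(t)$. The following is a short sketch of that computation (our own working, assuming the $Y_j$ are independent with $E\exp(itY_j) = f_X^{\mathrm{ft}}(t) f_{\varepsilon_j}^{\mathrm{ft}}(t)$ and taking $\Psi_n(t) = \sum_j f_{\varepsilon_j}^{\mathrm{ft}}(-t)\exp(itY_j) / \sum_k |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2$).

```latex
% Uses f^{ft}(-t) f^{ft}(t) = |f^{ft}(t)|^2 and Var(exp(itY)) = 1 - |E exp(itY)|^2.
\begin{align*}
E\,\Psi_n(t)
  &= \frac{\sum_{j} f_{\varepsilon_j}^{\mathrm{ft}}(-t)\, f_X^{\mathrm{ft}}(t)\, f_{\varepsilon_j}^{\mathrm{ft}}(t)}
          {\sum_{k} |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2}
   = f_X^{\mathrm{ft}}(t), \\
\mathrm{Var}\,\Psi_n(t)
  &= \frac{\sum_{j} |f_{\varepsilon_j}^{\mathrm{ft}}(t)|^2
           \bigl(1 - |f_X^{\mathrm{ft}}(t)|^2 |f_{\varepsilon_j}^{\mathrm{ft}}(t)|^2\bigr)}
          {\bigl(\sum_{k} |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2\bigr)^2} \\
  &= \frac{1}{\sum_{k} |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2}
   - |f_X^{\mathrm{ft}}(t)|^2\,
     \frac{\sum_{j} |f_{\varepsilon_j}^{\mathrm{ft}}(t)|^4}
          {\bigl(\sum_{k} |f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2\bigr)^2}.
\end{align*}
```

Multiplying the variance by $|K^{\mathrm{ft}}(t/\omega_n)|^2/(2\pi)$ and integrating gives the variance part of the MISE, while unbiasedness of $\Psi_n$ leaves only the smoothing bias $|K^{\mathrm{ft}}(t/\omega_n) - 1|^2 |f_X^{\mathrm{ft}}(t)|^2$.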




From the above lemma, we will be able to derive the rates of convergence of our
estimator and prove their optimality in $\mathcal{F}_{\beta,C}$, the class of densities uniformly bounded
relative to their Sobolev ($\beta$-)norm, that is, that satisfy

$$\int |f_X^{\mathrm{ft}}(t)|^2 (1 + t^2)^{\beta}\, dt \le C. \tag{2.4}$$

Throughout, we assume $\beta > 1/2$, which ensures, for example, continuity of $f_X$. We also
assume that the kernel $K$ satisfies the following condition, which is fulfilled by, for
example, the sinc kernel $K_1$ (for any $\beta > 1/2$).

Condition B. $|K^{\mathrm{ft}}(t)| \le 1$ for all $t$, $K^{\mathrm{ft}}$ is supported on $[-1,1]$ and $|K^{\mathrm{ft}}(t) - 1| = O(|t|^{\beta})$
with $\beta$ as in (2.4).

Finally, we need some regularity assumptions on the error densities $f_{\varepsilon_j}$: we assume the
existence of $a, C > 0$ and the existence of some positive monotone decreasing functions
$\overline{\varphi}_{j,n}(t)$ and $\underline{\varphi}_{j,n}(t)$ for $t > 0$ such that the following condition holds.

Condition C.

$P(|\varepsilon_j| \le a) \ge C$, $\forall j, n$, (C.1)
$|f_{\varepsilon_j}^{\mathrm{ft}}(t)| \ge \underline{\varphi}_{j,n}(T)$, $\forall |t| \le T$, (C.2)
$\underline{\varphi}_{j,n}(t) \le |f_{\varepsilon_j}^{\mathrm{ft}}(t)| \le \overline{\varphi}_{j,n}(t)$, $\forall t > T$, (C.3)
$|f_{\varepsilon_j}^{\mathrm{ft}\,\prime}(t)| \le \overline{\varphi}_{j,n}(t)$, $\forall t > T$, (C.4)
$\underline{\varphi}_{j,n}(t) \ge c_1 \cdot \overline{\varphi}_{j,n}(c_2 t)$, $\forall t > 0$, (C.5)

for some $T > 0$, $c_1 > 0$ and $c_2 > 1$ which are independent of $j$ and $n$. Note that
condition (C.1) prevents $f_{\varepsilon_j}$ from spreading too intensively, while the other conditions
represent a weak version of monotonicity for $|f_{\varepsilon_j}^{\mathrm{ft}}|$. In particular, the so-called ordinary
smooth densities $f_{\varepsilon_j}$, in the terminology of Fan (1991a, b), satisfy $\underline{\varphi}_{j,n}(t) = C_1|t|^{-\nu}$
and $\overline{\varphi}_{j,n}(t) = C_2|t|^{-\nu}$ with $C_2 > C_1 > 0$ and $\nu > 0$, and the supersmooth densities satisfy
$\underline{\varphi}_{j,n}(t) = C_1|t|^{\rho_1}\exp(-c|t|^{\gamma})$ and $\overline{\varphi}_{j,n}(t) = C_2|t|^{\rho_2}\exp(-c|t|^{\gamma})$ with $C_2 > C_1 > 0$, $c > 0$,
$\gamma > 0$ and $\rho_2 > \rho_1 > 0$.

Under these conditions, we are ready to establish the rates of convergence of our
estimator; the following theorem shows that, if the bandwidth is chosen appropriately,
then our estimator achieves optimal rates.

Theorem 2.1. Under Conditions A-C, assume the existence of a sequence $m_n \uparrow \infty$ such
that, for some $C_2 > C_1 > 0$, $\beta > 1/2$,

$$C_1 m_n^{1+2\beta} \le \sum_{j=1}^n |\overline{\varphi}_{j,n}(m_n)|^2 \le C_2 m_n^{1+2\beta} \tag{2.5}$$

holds for all $n$. Then,

(a) when selecting $\omega_n = c_2^{-1} m_n$ (with $c_2$ defined in (C.5)), the estimator (2.2) fulfills

$$\sup_{f_X \in \mathcal{F}_{\beta,C}} \mathrm{MISE}_n(f_X) = O(m_n^{-2\beta});$$

(b) for an arbitrary estimator $\hat f_n$ based on $Y_1, \ldots, Y_n$ and $C$ in (2.4) large enough, we
have

$$\sup_{f_X \in \mathcal{F}_{\beta,C}} \mathrm{MISE}_n(f_X) \ge \mathrm{const.} \cdot m_n^{-2\beta}.$$
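To see what the sequence $m_n$ looks like in a familiar situation, the following worked example (our own illustration, not taken from the paper) evaluates the balance condition (2.5) when every error is ordinary smooth of the same order $\nu$, so that the bounding functions are proportional to $t^{-\nu}$ for all $j$, and then when two groups of equal size have different orders $\nu_1 < \nu_2$.

```latex
% Homoscedastic-type case: every bounding function is C_2 t^{-\nu}.
\[
\sum_{j=1}^n \bigl|C_2\, m_n^{-\nu}\bigr|^2 = C_2^2\, n\, m_n^{-2\nu} \asymp m_n^{1+2\beta}
\quad\Longrightarrow\quad
m_n \asymp n^{1/(2\beta+2\nu+1)},
\qquad
m_n^{-2\beta} \asymp n^{-2\beta/(2\beta+2\nu+1)} .
\]
% Two groups of size n/2 with orders \nu_1 < \nu_2: the less smooth group dominates the sum.
\[
\tfrac{n}{2}\, C_2^2 \bigl(m_n^{-2\nu_1} + m_n^{-2\nu_2}\bigr) \asymp n\, m_n^{-2\nu_1}
\quad\Longrightarrow\quad
m_n^{-2\beta} \asymp n^{-2\beta/(2\beta+2\nu_1+1)} .
\]
```

The first line recovers the rate usually quoted for homoscedastic ordinary smooth errors. The second suggests that, under these assumptions, the rate is driven by the observations contaminated by the least smooth errors.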

A more precise asymptotic description of the MISE, which we denote by AMISE, can
be obtained under additional assumptions, by using a Taylor expansion of the bias term.
Such an asymptotic expression is useful for deriving a data-driven bandwidth (see Section
5.1). Assume the following.

Condition D.

$\omega_n \to \infty$ and $n/\omega_n \to \infty$ as $n \to \infty$, (D.1)
$K$ is such that $\int |y^{k+1} K(y)|\, dy < \infty$ and is of order $k$, (D.2)
$f_X$ is $k + 1$ times differentiable, $\sup_{j=0,\ldots,k+1} \|f_X^{(j)}\|_\infty < \infty$ and $f_X^{(k)} \in L_2(\mathbb{R})$, (D.3)

where a $k$th-order kernel is a kernel that satisfies $\mu_{K,j} = \int x^j K(x)\, dx = 1_{\{j=0\}}$ for $j =
0, \ldots, k - 1$ and $\mu_{K,k} = c$, with $c \neq 0$ some finite constant.

The AMISE is described in the next lemma, where we use the standard notation
$h = \omega_n^{-1}$ for the bandwidth in order to highlight the usual bias-variance trade-off.

Lemma 2.2. Under Conditions A and D, the estimator (2.2) satisfies $\mathrm{MISE}_n(f_X) =
\mathrm{AMISE}_n(f_X) - R_n + o(h^{2k})$, where

$$\mathrm{AMISE}_n(f_X) = \frac{h^{2k}\mu_{K,k}^2}{(k!)^2} \int (f_X^{(k)})^2 + \frac{1}{2\pi h} \int |K^{\mathrm{ft}}(t)|^2 \Bigl(\sum_{j=1}^n |f_{\varepsilon_j}^{\mathrm{ft}}(t/h)|^2\Bigr)^{-1} dt \tag{2.6}$$

and $R_n = (2\pi)^{-1} \int \bigl(\sum_{j=1}^n |f_{\varepsilon_j}^{\mathrm{ft}}(t)|^2\bigr)^{-2} \bigl(\sum_{j=1}^n |f_{\varepsilon_j}^{\mathrm{ft}}(t)|^4\bigr) |f_X^{\mathrm{ft}}(t)|^2 |K^{\mathrm{ft}}(t/\omega_n)|^2\, dt$.

It can be shown that under mild conditions (e.g., Condition C), the term $R_n$ is
negligible compared to the AMISE.

3. A few interesting results in limiting cases 

This section is dedicated to studying a few interesting results obtained when considering 
limiting cases of model (2.1). We consider two extreme and opposite situations - error 
scales tending to zero or tending to infinity - and see how well the estimator behaves in 
these cases. 

3.1. Averaging replicated observations 

Context. Consider the rather frequent situation where the errors are homoscedastic and,
for some individuals, replicated observations are available. The observations are of the
form

$$Y_{j,k} = X_j + \varepsilon_{j,k}, \qquad j \in \{1, \ldots, n\},\ k \in \{1, \ldots, r_{j,n}\}, \tag{3.1}$$

where $\varepsilon_{j,k} \sim f_\varepsilon$. When such data are available, it is rather common to work with the
averaged observations $\bar Y_j = r_{j,n}^{-1} \sum_{k=1}^{r_{j,n}} Y_{j,k}$. Indeed, although, in (asymptotic) theory,
using the averaged sample is not always advantageous - in some cases (ordinary smooth),
the averaged errors become smoother and thus imply a slower rate of convergence - in
finite samples, the variance reduction induced by the averaging process can lead to
significant improvement of performance of the estimator; see Delaigle (2008). In this context,
we apply our estimator (2.2) to the sample $\bar Y_j = X_j + \bar\varepsilon_j$ where, since $r_{j,n}$ may differ
among individuals, the errors $\bar\varepsilon_j := r_{j,n}^{-1} \sum_{k=1}^{r_{j,n}} \varepsilon_{j,k}$ are heteroscedastic. Below, we denote
the density of $\bar\varepsilon_j$ by $f_{\bar\varepsilon_j}$.

The normal case. In many real data applications, it is reasonable to assume that the
error is normally distributed, that is, $f_\varepsilon = N(\mu, \sigma^2)$ and $f_{\bar\varepsilon_j} = N(\mu, \sigma_{j,n}^2)$ with $\sigma_{j,n}^2 =
\sigma^2/r_{j,n}$ and $f_{\bar\varepsilon_j}^{\mathrm{ft}}(t) = \{f_\varepsilon^{\mathrm{ft}}(t/r_{j,n})\}^{r_{j,n}}$. First, we show that in this case, there is no loss of
information when using the averaged sample to estimate $f_X$.

Theorem 3.1. Suppose $f_\varepsilon = N(\mu, \sigma^2)$ in the model (3.1). Then, the sample $\bar Y_1, \ldots, \bar Y_n$
is sufficient for $f_X$.

It is clear that each $f_{\bar\varepsilon_j}$ satisfies Condition C; Conditions A, B and (2.5) hold by
appropriate selection of $K$ and $\omega_n$. Hence, Theorem 2.1 ensures rate optimality of our
estimator (2.2) applied to the averaged data. It is not hard to prove that for $r_{j,n}$ fixed,
the convergence rates of $\hat f_n$ (when using the sample of averages, rather than the original
sample) remain unchanged, but the constants improve (hence, the estimator behaves
better with averaged data).
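The averaging step is simple to carry out in practice. The sketch below (hypothetical helper name, assuming a known common single-measurement variance) turns a ragged collection of replicates into the averaged sample and the per-individual error variances $\sigma^2/r_{j,n}$ that a heteroscedastic estimator would then need.

```python
import numpy as np

def average_replicates(replicates, sigma2):
    """Average the replicates of each individual.

    replicates: list of 1-D sequences, one per individual j; lengths r_j may differ.
    sigma2: common variance of a single measurement error (assumed known here).
    Returns (ybar, var_bar) where ybar[j] is the mean of individual j's replicates
    and var_bar[j] = sigma2 / r_j is the variance of the averaged error.
    """
    r = np.array([len(rep) for rep in replicates], dtype=float)
    ybar = np.array([np.mean(rep) for rep in replicates])
    return ybar, sigma2 / r
```

Individuals with more replicates receive a smaller error variance, which is exactly what makes the averaged sample heteroscedastic.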

To gain more intuition about the amount of improvement one can get when using 
averaged data, consider the rather extreme situation where, as the sample size increases, 
more and more replicated data become available. The result below then shows that the 
usual logarithmic rates of convergence of the normal case can even become algebraic (see 
also Hesse (1996) for a related problem in the partial contamination context). 

Theorem 3.2. Under the conditions of Theorem 3.1, one is able to obtain algebraic
rates for the supremum of the MISE taken over $f_X \in \mathcal{F}_{\beta,C}$, $\beta > 1/2$, if and only if there
are some $\alpha > 0$, $\gamma > 0$, $c > 0$, $\delta > 0$ such that

$$\# J_{n,\gamma,\alpha} \ge c \cdot n^{\delta} \quad \text{for all } n, \tag{3.2}$$

where we define $J_{n,\gamma,\alpha} := \{ j \in \{1, \ldots, n\} : \sigma_{j,n}^2 n^{\alpha} \le \gamma \cdot \ln n \}$.

For example, we easily verify (3.2) in the case $r_{j,n} \asymp j^{a_1} n^{a_2}$ with $a_1, a_2 \ge 0$ and $a_1 +
a_2 > 0$. Quite surprisingly, we notice the occurrence of algebraic rates in that case without
the need for the total number of original data $N = \sum_{j} r_{j,n}$ to increase exponentially
fast with increases of $n$. Here, $N$ increases only at a polynomial rate with $n$.
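Condition (3.2) can be explored numerically for a given replication pattern. The sketch below counts the indices $j$ with $\sigma_{j,n}^2 n^{\alpha} \le \gamma \ln n$ when $\sigma_{j,n}^2 = \sigma^2/r_{j,n}$ and $r_{j,n} = j^{a_1} n^{a_2}$. This reading of the set $J_{n,\gamma,\alpha}$ follows the way the set is used in the proof of Theorem 3.2 and should be treated as an assumption, as should the function name.

```python
import math

def count_J(n, gamma, alpha, sigma2, a1, a2):
    """Count j in 1..n with (sigma2 / r_j) * n**alpha <= gamma * ln(n),
    where r_j = j**a1 * n**a2 is the assumed number of replicates."""
    thresh = gamma * math.log(n)
    count = 0
    for j in range(1, n + 1):
        r = (j ** a1) * (n ** a2)
        if (sigma2 / r) * n ** alpha <= thresh:
            count += 1
    return count
```

With no growth in the number of replicates ($a_1 = a_2 = 0$) the count stays at zero for large $n$, whereas polynomial growth in either $j$ or $n$ quickly makes almost every index qualify.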

3.2. A case of unbounded scaling parameters 

Whereas Theorem 3.2 focused on the behavior of our estimator in an extreme case where
the error scale tends to zero, we now consider an opposite extreme situation where the
scaling parameters are unbounded. We study this problem in the particular case where
the $f_{\varepsilon_j}$ are symmetric and have the Fourier transform $f_{\varepsilon_j}^{\mathrm{ft}}(t) = \exp(-\sigma_{j,n}^{\gamma}|t|^{\gamma}/2)$ with
$\gamma \ge 1$ and some unbounded scaling parameters $\sigma_{j,n} > 0$. Examples of such densities are
Cauchy densities for $\gamma = 1$ and centered normal densities for $\gamma = 2$, where $\sigma_{j,n}$ are scaling
parameters. In this case, (C.1) is not satisfied and Theorem 2.1 cannot be applied. The
next theorem shows the somewhat surprising result that if the unbounded sequence
$(\sigma_{j,n})_{j,n}$ does not converge too rapidly to infinity, then the estimator remains consistent.

Theorem 3.3. (a) With a suitable choice of $\omega_n$ and $K$ so that $K^{\mathrm{ft}}$ is compactly supported
and Condition A is satisfied, estimator (2.2) is consistent for $f_X$ without any smoothness
assumptions on $f_X$ if, for any $\omega > 0$, we have

$$\sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega^{\gamma}) \xrightarrow{\,n \to \infty\,} \infty. \tag{3.3}$$

(b) If (3.3) is not valid, then there is no consistent estimator for $f_X \in \mathcal{F}_{\beta,C}$ with
arbitrary $\beta > 1/2$ and $C$ large enough.

This theorem also shows that the estimator (2.2) achieves consistency whenever
consistent estimation is theoretically possible, for $\beta > 1/2$ and $C$ large enough. Examples of
unbounded sequences that satisfy equation (3.3) are $\sigma_{j,n}^{\gamma} \le o_n \cdot \log n$ and $\sigma_{j,n}^{\gamma} \le o_j \cdot \log j$,
where $o_n$ is an arbitrary sequence tending to zero.
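Both examples can be checked in a couple of lines. The following worked computation is our own sketch, reading (3.3) as $\sum_{j} \exp(-\sigma_{j,n}^{\gamma}\omega^{\gamma}) \to \infty$ for every fixed $\omega > 0$ and taking the null sequence $o_n$ to be non-negative.

```latex
% Case 1: \sigma_{j,n}^{\gamma} \le o_n \log n for all j \le n.
\[
\sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega^{\gamma})
\;\ge\; n \exp(-o_n\,\omega^{\gamma}\log n)
\;=\; n^{\,1 - o_n\omega^{\gamma}} \;\longrightarrow\; \infty,
\]
% because o_n \omega^{\gamma} \le 1/2 once n is large enough.
% Case 2: \sigma_{j,n}^{\gamma} \le o_j \log j. Choose j_0 such that o_j \omega^{\gamma} \le 1/2 for j \ge j_0.
\[
\sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega^{\gamma})
\;\ge\; \sum_{j=j_0}^n j^{-o_j\omega^{\gamma}}
\;\ge\; \sum_{j=j_0}^n j^{-1/2} \;\longrightarrow\; \infty .
\]
```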

4. The case of unknown error densities 

Most papers dealing with deconvolution problems assume that the error densities are per- 
fectly known as, otherwise, the target density is not identifiable in the standard models. 
However, since the error density is unknown in many practical situations, this classical 
condition is relaxed in some recent papers. As a payback, those models require either 
the availability of additional direct data from the error distribution (Diggle and Hall 
(1993), Neumann (1997)) or replicated measurements (Horowitz and Markatou (1996),
Schennach (2004), Delaigle, Hall and Müller (2007), Delaigle, Hall and Meister (2008))
or more restrictive conditions on the target density (Butucea and Matias (2005), Meister 
(2006, 2007)). 

In the heteroscedastic framework, the replicated measurement approach is of particular
practical importance. In the context of Section 3.1, for example, that is, replicated
measurement under normal contamination, and where the mean $\mu = 0$, but the variance
$\sigma^2$ is unknown, $\sigma^2$ is estimable by $\hat\sigma^2 = (2N)^{-1} \sum_{(j,k_1,k_2) \in S} (Y_{j,k_1} - Y_{j,k_2})^2$, where
$S = \{(j,k_1,k_2)$ such that $1 \le j \le n$, $1 \le k_1 < k_2 \le r_{j,n}\}$ and $N = \#S$. The estimated
variance $\hat\sigma^2$ may replace $\sigma^2$ in the estimator (2.2) and it can be shown that this does not
alter the convergence rates of Theorem 2.1 for sufficiently smooth $f_X$. This parametric
procedure of error estimation is fairly standard in homoscedastic deconvolution because
the possibility of obtaining replicated measurements is usually quite realistic; see, for
example, Carroll, Eltinge and Ruppert (1993), Stefanski and Bay (1996), Carroll et al.
(2004) and the references therein.
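The variance estimator built from replicate differences is easy to compute. A minimal sketch follows, with a hypothetical function name and with the data stored as one list of replicates per individual; it sums the squared differences over all within-individual pairs $k_1 < k_2$ and divides by twice the number of pairs.

```python
from itertools import combinations

def replicate_variance(replicates):
    """Estimate sigma^2 by (2N)^{-1} * sum of (Y_{j,k1} - Y_{j,k2})^2 over all
    individuals j and all pairs k1 < k2 of replicates of the same individual,
    where N is the total number of such pairs."""
    total = 0.0
    n_pairs = 0
    for rep in replicates:
        for y1, y2 in combinations(rep, 2):
            total += (y1 - y2) ** 2
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("at least one individual needs two or more replicates")
    return total / (2 * n_pairs)
```

Individuals with a single measurement contribute no pairs and are simply skipped, since the unknown $X_j$ cancels only in within-individual differences.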

More surprisingly, in the most general case of our much less standard setting, where 
all error distributions are allowed to be different and no parametric shape is assumed 
for their densities, we are still able to use those replicates to consistently estimate the 
density fx under certain smoothness conditions. Indeed, if each observation is replicated 
at least once, $f_X$ can be consistently estimated by $\hat f_{n,2}$ introduced in Section 2, where
$\Phi_n(t)$ is replaced by the nonparametric estimator



J2 E ^MmM-YjM)} 



with $\rho > 0$ a ridge parameter introduced to avoid division by zero and $S$ as above.
For symmetric error densities with non-vanishing Fourier transforms and appropriate
selection of $h$ and $\rho$, consistency remains valid, even if the replicates of the same $X_j$
have different error distributions. We note, however, that in this very general case, the 
convergence rates of Theorem 2.1 cannot be maintained when the errors are ordinary 
smooth. 

In the homoscedastic case, if the error density is known up to a scaling parameter, it
is sometimes possible to estimate both that parameter and the target density $f_X$ without
replicates. However, this can only be done by imposing more restrictive conditions
on $f_X$ since a specific lower bound on $f_X^{\mathrm{ft}}$ has to be assumed (see Butucea and Matias
(2005) or Meister (2006)). Under some circumstances, such methods can be extended
to the heteroscedastic problem. For example, suppose we can assume that $f_X$ is symmetric
and satisfies $|f_X^{\mathrm{ft}}(t)| \ge c/(1 + |t|^{\beta+1/2})$ for all $t \in \mathbb{R}$ and some known $\beta > 0$ and
$c > 0$, and that each error $\varepsilon_j$ is $N(0, \sigma_j^2)$, where $\sigma_j^2 = a(1 + j/n)$, that is, the error
variances follow a linear model with an unknown parameter $a$, say, in $[1,2]$. Note that
$\psi(a,t) = n^{-1}\sum_{j=1}^n \exp(-\sigma_j^2 t^2/2) f_X^{\mathrm{ft}}(t)$ is $n^{-1/2}$-consistently estimable by the maximum
of zero and the real part of the empirical characteristic function of the data for any
$t$. Define known upper and lower bounds on $\psi(a,t)$ by $\overline{\psi}(a,t) = n^{-1}\sum_{j=1}^n \exp(-\sigma_j^2 t^2/2)$
and $\underline{\psi}(a,t) = n^{-1}\sum_{j=1}^n \exp(-\sigma_j^2 t^2/2)\, c/(1 + |t|^{\beta+1/2})$, respectively. We notice that for any
$a > a'$, we have $\overline{\psi}(a,t) < \underline{\psi}(a',t)$ for $t$ sufficiently large. Introducing an equidistant
partition of the interval $[1,2]$, where $a_j = 1 + j/m$, $j = 1, \ldots, m$, are the grid points, we fix $t$
large enough so that $\overline{\psi}(a_{j-1},t) > \underline{\psi}(a_{j-1},t) > \overline{\psi}(a_j,t) > \underline{\psi}(a_j,t) > \overline{\psi}(a_{j+1},t) > \underline{\psi}(a_{j+1},t)$.
If, for some $j$, the empirically accessible function $\psi(a,t)$ lies between $\overline{\psi}(a_j,t)$ and
$\overline{\psi}(a_{j+1},t)$, we have $a \in [a_{j-1},a_{j+1}]$ as $\psi(a,t)$ decreases monotonically in $a$. Then, by
setting $m \to \infty$ at an appropriate order in $n$, we are able to estimate $a$; we may then
insert its empirical counterpart $\hat a$ into the estimator (2.2). Although those identification
methods are very interesting, the framework of the current paper does not allow a more
comprehensive study of this problem. However, we have learned that it is sometimes
possible to extend the basic ideas of Butucea and Matias (2005) and Meister (2006) to
the heteroscedastic setting.

5. Finite-sample performance
5.1. Data-driven bandwidth selection

We define the optimal bandwidth as the one that minimizes the MISE and estimate
this bandwidth by a plug-in method similar to Delaigle and Gijbels (2004). We
follow along the lines of their two-stage procedure and only explain the differences
with their estimator, for a $k$th-order kernel. We select the bandwidth that minimizes
the estimator of the AMISE in (2.6), obtained by replacing the unknown quantity
$\theta_k = \int (f_X^{(k)})^2$ by $\hat\theta_k = \int \{\hat f_X^{(k)}\}^2$, where, for $r$ any positive integer, $\hat f_X^{(r)}(x) =
(2\pi)^{-1} \int (-it)^r \exp(-itx) K^{\mathrm{ft}}(t h_r) \Psi_n(t)\, dt$. Here, for all $r$, $h_r > 0$ is a bandwidth parameter;
in particular, $h_k$ needs to be chosen to ensure consistency of the estimator of $f_X$. We
choose $h_r$ that minimizes the asymptotic mean squared error (AMSE) of the estimator $\hat\theta_r$.
As in the homoscedastic case, the AMSE can be decomposed as the sum of a squared bias
term and a variance term, where, under sufficient conditions (see Delaigle and Meister
(2007)), the latter is negligible; $h_r$ can thus be chosen on the basis of the sole asymptotic
bias, given by

$$\mathrm{ABias}[\hat\theta_r] = (-1)^{k/2}\, \frac{2 h_r^k}{k!}\, \mu_{K,k}\, \theta_{r+k/2}. \tag{5.1}$$

The procedure of Delaigle and Gijbels (2004) involves estimation of $\theta_{2k}$ by an estimator
$\hat\theta_{2k} = (4k)!/((2\hat\sigma_X)^{4k+1}(2k)!\,\pi^{1/2})$, obtained by assuming that $f_X$ is a normal density.
Here, $\hat\sigma_X$ is an estimator of the standard deviation of $X$, which, in our context, can be, for
example,

$$\hat\sigma_X^2 = \Bigl[ n^{-1} \sum_{i=1}^n \Bigl( Y_i - n^{-1} \sum_{i=1}^n Y_i \Bigr)^2 \Bigr] - \Bigl[ n^{-1} \sum_{i=1}^n E(\varepsilon_i^2) - \Bigl\{ n^{-1} \sum_{i=1}^n E(\varepsilon_i) \Bigr\}^2 \Bigr].$$

5.2. Simulation results

We applied our estimator (2.2) to simulated examples from two densities $f_X$: (1)
$X \sim 0.5N(-3,1) + 0.5N(2,1)$ and (2) $X \sim 0.75N(0,1) + 0.25N(1.5, 1/81)$. We considered four
heteroscedastic models: (i) $\varepsilon_1, \ldots, \varepsilon_{n/2} \sim N(0,\sigma_1^2)$ and $\varepsilon_{n/2+1}, \ldots, \varepsilon_n \sim \mathrm{Laplace}(\sigma_2)$; (ii)
$\varepsilon_1, \ldots, \varepsilon_{n/2} \sim N(0,\sigma_1^2)$ and $\varepsilon_{n/2+1}, \ldots, \varepsilon_n = 0$; (iii) one error density $f_\varepsilon \sim N(0,\sigma_1^2)$, but a
different number of replicated observations - here, we use the averaged data as in Section
3.1; and (iv) $\varepsilon_i \sim N(0, \sigma_2^2(1 + i/n))$. These are non-trivial situations because the target
densities $f_X$ are not easy to estimate and normal errors are hard to deconvolve.

Figure 1. Estimators of density (1) from samples of size n = 100 (first row) and n = 250 (second
row), generated from model (i) (left panel), (ii) (center panel) and (iv) (right panel).

For density (1) (resp., density (2)), we took $\sigma_1$ and $\sigma_2$ such that $\mathrm{Var}(\varepsilon_1) = 25\%$
(resp., 10%) $\times \mathrm{Var}(X)$ and $\sigma_2^2 = 10\%$ (resp., 5%) $\times \mathrm{Var}(X)$. In each case, we generated
500 contaminated samples of size $n = 50$, 100 or 250 from the distribution of density
(1) or (2). For each sample, we constructed the estimator (2.2) using the plug-in
bandwidth of Section 5.1 and the kernel $K_2$. To evaluate performance, we calculated,
on a grid of 81 equidistant values of $x$, the quantiles $q_p(x)$ of the 500 estimates
$\hat f_n(x)$ for $p = 0.1$, 0.25, 0.5, 0.75 and 0.9. In the graphs, we refer to $q_{0.5}$ as
the median, $q_{0.25}$ and $q_{0.75}$ as the quartiles and $q_{0.1}$, $q_{0.9}$ as the deciles. We only
present partial results, but our conclusions were also supported by the unreported
cases.
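The pointwise summary used for the graphs can be computed directly once the estimates from all simulated samples are stored row by row on a common grid. A minimal sketch follows; the function name and the array layout are assumptions made for illustration.

```python
import numpy as np

def pointwise_quantiles(estimates, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Pointwise quantile curves of a collection of density estimates.

    estimates: array of shape (n_samples, n_grid); row s holds the estimate
               computed from simulated sample s, evaluated on a common grid.
    Returns a dict mapping each probability p to the curve q_p(x) on the grid.
    """
    est = np.asarray(estimates, dtype=float)
    q = np.quantile(est, probs, axis=0)  # shape (len(probs), n_grid)
    return {p: q[i] for i, p in enumerate(probs)}
```

In the setting described above, `estimates` would have 500 rows and 81 columns, and the curve for `p = 0.5` would be the median curve.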

In Figure 1, we show some quantile curves constructed from samples of size n = 100 
and 250, generated from density (1) under models (i), (ii) and (iv). As expected by the 
theory, these graphs show a clear improvement of the results from (i) to (ii) and when 
the sample size increases. We also see that our method does not have particular problems 
in dealing with the case of individual errors. 

In Figure 2, we compare the results for density (2) and samples of size n = 100 or 250 
coming from models (i), (ii) and (iii), where 25% of the observations are not replicated 
and 50% (resp., 25%) of the observations are replicated twice (resp., ten times). Here, 
again, we see an improvement of the quality of the estimator from model (i) to model 
(ii) and the estimator handles the case of a different number of replicated measurements 
without any particular difficulty. 

Additional results not reported here (see Delaigle and Meister (2007)) showed that
the data-driven bandwidth procedure suffers from only a small loss of performance
compared to the optimal bandwidth. In addition, although, asymptotically, the
estimator that discards the observations contaminated by the smoothest errors has
the same behavior as the estimator that uses all the observations, the latter had
better practical properties, especially for the smallest sample sizes. Finally, our
method worked considerably better than the one that ignores the errors in the
data.

Figure 2. Estimators of density (2) from samples of size n = 100 (first row) and n = 250 (second
row), in the case of normal and Laplace errors (first column), partially normally contaminated
(second column) and replicated observations with normal errors (third column).



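6. Proofs

Proof of Theorem 2.1. Part (a) follows from (C.2), (C.3), (C.5) and (2.5) applied to
the fact that the MISE of the estimator is bounded by the sum of the first two terms of
(2.3), which, in turn, is bounded by

$$\sup_{f_X \in \mathcal{F}_{\beta,C}} \mathrm{MISE}_n(f_X) = O\Bigl( \int_{-\omega_n}^{\omega_n} \Bigl( \sum_{j=1}^n |f_{\varepsilon_j}^{\mathrm{ft}}(t)|^2 \Bigr)^{-1} dt + \omega_n^{-2\beta} \Bigr). \tag{6.1}$$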



Concerning part (b), we note that Fan (1991a, b, 1993) derives theoretical lower bounds
for standard density deconvolution under Hölder conditions; those results can be extended
to Sobolev classes (see Neumann (1997)). Since we are considering a problem with
non-identically distributed data, a new concept is required.
Let $f_0(x) = \pi^{-1}(1 + x^2)^{-1}$ be the Cauchy density and set $f_1(x) = (1 - \cos x)/(\pi x^2)$
with $f_1^{\mathrm{ft}}(t) = (1 - |t|) \cdot 1_{[-1,1]}(t)$. We introduce the densities

2[m„J 

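with $\theta_j \in \{0,1\}$. For $C$ and $n$ large enough, all $f_\theta$'s are contained in $\mathcal{F}_{\beta,C}$. Similarly to
Fan (1993), we randomize the vector $\theta$ so that the $\theta_j$'s are i.i.d. with $P(\theta_j = 0) = 1/2$ and
define $\theta_{j,0} = (\theta_{\lfloor m_n \rfloor}, \ldots, \theta_{j-1}, 0, \theta_{j+1}, \ldots, \theta_{2\lfloor m_n \rfloor})$ and $\theta_{j,1}$ accordingly. An application of
Parseval's identity, combined with the fact that the $f_1^{\mathrm{ft}}(\cdot - 2j)$'s, $j$ integer, have disjoint
supports, shows that after calculating the expectation with respect to $\theta_j$, we obtain, for
any estimator $\hat f_n$, that

$$\begin{aligned}
E_\theta E_{f_\theta} \|\hat f_n - f_\theta\|^2_{L_2(\mathbb{R})}
&\ge (2\pi)^{-1} E_\theta E_{f_\theta} \int |\hat f_n^{\mathrm{ft}}(t) - f_\theta^{\mathrm{ft}}(t)|^2\, dt \\
&\ge \mathrm{const.} \sum_{j=\lfloor m_n \rfloor}^{2\lfloor m_n \rfloor} \int_{2j-1}^{2j+1} |f_{\theta_{j,0}}^{\mathrm{ft}}(t) - f_{\theta_{j,1}}^{\mathrm{ft}}(t)|^2\, dt \\
&\ge \mathrm{const.} \sum_{j=\lfloor m_n \rfloor}^{2\lfloor m_n \rfloor} m_n^{-2\beta-1} \ge \mathrm{const.} \cdot m_n^{-2\beta}
\end{aligned} \tag{6.2}$$

if, for any $|j| \in [\lfloor m_n \rfloor, 2\lfloor m_n \rfloor]$ and any $\theta_l \in \{0,1\}$ with $l \neq j$, we have

$$\int \cdots \int \min\Bigl\{ \prod_{k=1}^n h_{k,\theta_{j,0}}(y_k),\ \prod_{k=1}^n h_{k,\theta_{j,1}}(y_k) \Bigr\}\, dy_1 \cdots dy_n \ge \mathrm{const.} > 0, \tag{6.3}$$

with the densities $h_{k,\theta} = f_\theta * f_{\varepsilon_k}$. Applying LeCam's inequality (see, e.g., Devroye
(1987), page 7) and the logarithmic function to both sides of (6.3), we see that (6.3) is
satisfied if

$$\sum_{k=1}^n (1 - a_{j,k,n})/a_{j,k,n} = O(1) \tag{6.4}$$

holds for all $|j| \in [\lfloor m_n \rfloor, 2\lfloor m_n \rfloor]$, where we write

$$a_{j,k,n} := \int [(f_{\theta_{j,0}} * f_{\varepsilon_k})(x)\,(f_{\theta_{j,1}} * f_{\varepsilon_k})(x)]^{1/2}\, dx.$$

Due to $f_{\theta_{j,\cdot}} \ge (1/2) f_0$, we see that $a_{j,k,n} \ge 1/2$ and, hence, (6.4) follows from

$$\sum_{k=1}^n \chi^2(h_{k,\theta_{j,0}}, h_{k,\theta_{j,1}}) = O(1), \tag{6.5}$$

where $\chi^2(f,g) = \int (f - g)^2/f\, dx$ denotes the $\chi^2$-distance of densities. This generalizes the
condition in Fan (1991a, b), $\chi^2(h_{\theta_{j,0}}, h_{\theta_{j,1}}) = O(1/n)$, to the case of heteroscedastic
contamination. We notice that the left-hand side of (6.5) is bounded above by

$$\mathrm{const.} \cdot \sum_{k=1}^n \int \frac{|[(f_{\theta_{j,0}} - f_{\theta_{j,1}}) * f_{\varepsilon_k}](x)|^2}{[f_0 * f_{\varepsilon_k}](x)}\, dx. \tag{6.6}$$

Unlike in the situation of i.i.d. data, the denominator in (6.6) still depends on $k$ and $n$.
Condition (C.1) annuls this difficulty as we have

$$[f_0 * f_{\varepsilon_k}](x) \ge \pi^{-1} \int_{|y| \le a} [1 + (x - y)^2]^{-1} f_{\varepsilon_k}(y)\, dy \ge \mathrm{const.} \cdot [1 + x^2]^{-1}.$$

Therefore, applying the Fourier representation of the Sobolev norm, term (6.6) is bounded
above by

$$\begin{aligned}
&O(m_n^{-2\beta-1}) \sum_{k=1}^n \int \bigl( |f_1^{\mathrm{ft}}(t - 2j) f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2 + |f_1^{\mathrm{ft}\,\prime}(t - 2j) f_{\varepsilon_k}^{\mathrm{ft}}(t)|^2 + |f_1^{\mathrm{ft}}(t - 2j) f_{\varepsilon_k}^{\mathrm{ft}\,\prime}(t)|^2 \bigr)\, dt \\
&\quad \le O(m_n^{-2\beta-1}) \sum_{k=1}^n |\overline{\varphi}_{k,n}(m_n)|^2,
\end{aligned}$$

due to (C.3) and (C.4). Finally, (2.5) implies (6.5), which proves the theorem. □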
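Proof of Theorem 3.1. We introduce the orthonormal $r_{j,n} \times r_{j,n}$ matrices $A_{j,n}$ which
consist of $r_{j,n}^{-1/2} \cdot (1, \ldots, 1)$ as their first row. Setting $W_{j,\cdot} := A_{j,n} Y_{j,\cdot}$ with $Y_{j,\cdot} :=
(Y_{j,1}, \ldots, Y_{j,r_{j,n}})^T$, we notice that $W_{j,1} = r_{j,n}^{1/2} \bar Y_j$, while the other components of $W_{j,\cdot}$
are measurable in the $\sigma$-algebra generated by $\varepsilon_{j,1}, \ldots, \varepsilon_{j,r_{j,n}}$ since any row of $A_{j,n}$ except
the first one sums to zero, due to the orthonormal structure of $A_{j,n}$. Concerning the
density $f_{Y_{j,\cdot}}$ of $Y_{j,\cdot}$, we derive

$$\begin{aligned}
f_{Y_{j,\cdot}}(y_{j,\cdot})
&= \Bigl( \frac{1}{2\pi\sigma^2} \Bigr)^{r_{j,n}/2} \int f_X(x) \exp\bigl( -\|y_{j,\cdot} - (x + \mu) \cdot (1, \ldots, 1)^T\|^2/(2\sigma^2) \bigr)\, dx \\
&= \Bigl( \frac{1}{2\pi\sigma^2} \Bigr)^{r_{j,n}/2} \int f_X(x) \exp\bigl( -\|A_{j,n} y_{j,\cdot} - r_{j,n}^{1/2}(x + \mu) \cdot (1, 0, \ldots, 0)^T\|^2/(2\sigma^2) \bigr)\, dx \\
&= \Bigl( \frac{1}{2\pi\sigma^2} \Bigr)^{r_{j,n}/2} \exp\Bigl( -\frac{1}{2\sigma^2} \sum_{k=2}^{r_{j,n}} |w_{j,k}|^2 \Bigr) \\
&\quad \times \int f_X(x) \exp\bigl( -|w_{j,1} - r_{j,n}^{1/2}(x + \mu)|^2/(2\sigma^2) \bigr)\, dx,
\end{aligned}$$

where $\|\cdot\|$ denotes the Euclidean norm and $w_{j,\cdot} = A_{j,n} y_{j,\cdot}$. Therefore, we see that the
conditional distribution of $Y_{j,\cdot}$ given $W_{j,1}$ and, hence, the distribution of all available
data,

$$dP(Y_{\cdot,\cdot} = y_{\cdot,\cdot} \mid W_{\cdot,1}) = \prod_{j=1}^n dP(Y_{j,\cdot} = y_{j,\cdot} \mid W_{j,1}),$$

do not depend on $f_X$. Thus we have shown sufficiency and the proof is complete. □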



Proof of Theorem 3.2. First, we assume condition (3.2) and take the sinc kernel $K_1$.
In view of (3.2), for an arbitrarily small $\gamma' \in (0,\gamma)$, we can choose $\alpha' \in (0,\alpha)$ sufficiently
small so that $\liminf_{n \to \infty} \#J_{n,\gamma',\alpha'}/\#J_{n,\gamma,\alpha} \ge 1$. This shows that, in (3.2), we can choose
$\gamma = \gamma'$ and $\alpha = \alpha'$ with $\alpha'/2 + \gamma' - \delta < 0$. Setting $\omega_n = n^{\delta/(2\beta+1)}$ for $\delta < (\beta + 1/2)\alpha'$ and
$\omega_n = n^{\alpha'/2}$ otherwise, we learn from (6.1) that the bias term converges at algebraic rates.
The variance has the upper bound

$$O(\omega_n) \cdot \Bigl[ \sum_{j=1}^n \exp(-\sigma_{j,n}^2 \omega_n^2) \Bigr]^{-1} \le O(\omega_n n^{\gamma' - \delta}) \le O(n^{\alpha'/2 + \gamma' - \delta}).$$

Hence, the algebraic decay of the MISE has been established.

For the reverse implication, assume that the supremum of the MISE (and thus the bias
and the variance terms in (6.1)) converges with an algebraic rate. The bias term then
implies that $\omega_n \ge c \cdot n^s$ with $s > 0$, while the variance term is bounded below by

$$\begin{aligned}
&\mathrm{const.} \cdot n^s \cdot \Bigl[ \sum_{j=1}^n \exp(-\sigma_{j,n}^2 n^{2s}/4) \Bigr]^{-1} \\
&\quad = \mathrm{const.} \cdot n^s \cdot \Bigl[ \sum_{j \in J_{n,4,2s}} \exp(-\sigma_{j,n}^2 n^{2s}/4) + \sum_{j \notin J_{n,4,2s}} \exp(-\sigma_{j,n}^2 n^{2s}/4) \Bigr]^{-1} \\
&\quad \ge \mathrm{const.} \cdot n^s \cdot (\#J_{n,4,2s} + n^{-1} \cdot \#J_{n,4,2s}^c)^{-1} \ge \mathrm{const.} \cdot n^s \cdot (\#J_{n,4,2s} + 1)^{-1}.
\end{aligned}$$

We deduce the existence of a $\delta > 0$ and a $c > 0$ such that $\#J_{n,4,2s} \ge c \cdot n^{\delta}$. □

Proof of Theorem 3.3. (a) From (3.3), we can construct a sequence $(\omega_n)_n \uparrow \infty$ such
that

$$\omega_n \Bigl[ \sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega_n^{\gamma}) \Bigr]^{-1} \xrightarrow{\,n \to \infty\,} 0$$

for any known parameters $\sigma_{j,n}$. It follows that the variance term of estimator (2.2)
converges to 0 due to Lemma 2.1 - as does the bias term as $\omega_n \to \infty$.

(b) We assume that (3.3) does not hold. Then, there exist $\omega_0 > 0$ and $M > 0$ such that

$$\sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega_0^{\gamma}) \le M \tag{6.7}$$

for infinitely many $n$. In the sequel, we restrict our consideration to those $n$. We may
assume $\omega_0$ to be arbitrarily large without affecting the validity of (6.7) and we also note
that only a bounded number of the $\sigma_{j,n}$'s can be less than 1. Hence, in view of the
asymptotic behavior, we may assume that $\sigma_{j,n} \ge 1$ without loss of generality. For any
$\omega_1 > \omega_0$, we have

$$\sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega_1^{\gamma}/4) \le M \exp(\omega_0^{\gamma} - \omega_1^{\gamma}/4). \tag{6.8}$$

We introduce the density $f$ with Fourier transform $f^{\mathrm{ft}}(t) = (1 - |t/(2\omega_1)|) \cdot 1_{[-2\omega_1,2\omega_1]}(t)$
and the density $\tilde f$ whose Fourier transform is supported on $[-3\omega_1, 3\omega_1]$ and coincides
with $f^{\mathrm{ft}}(t)$ on its restriction to $[-\omega_1,\omega_1]$. On $[\omega_1,3\omega_1]$, the even function $\tilde f^{\mathrm{ft}}(t)$ is defined
as the linear connection of the points $(3\omega_1,0)$ and $(\omega_1, f^{\mathrm{ft}}(\omega_1))$. The existence of $\tilde f$ is
guaranteed by Pólya's criterion (see Lukacs (1970), page 83, Theorem 4.3.1). We notice
that $f, \tilde f \in \mathcal{F}_{\beta,C}$ for any $\beta > 1/2$, with $C$ sufficiently large. The Parseval identity gives us

$$\|f - \tilde f\|^2_{L_2(\mathbb{R})} \ge \omega_1/(48\pi).$$
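The Parseval bound can be verified by hand from the two piecewise linear Fourier transforms. The following computation is our own check, using the convention $\|g\|^2_{L_2} = (2\pi)^{-1}\int |g^{\mathrm{ft}}(t)|^2\,dt$ and the fact that $f^{\mathrm{ft}}(\omega_1) = 1/2$; it gives a slightly larger constant than the one needed.

```latex
\begin{align*}
|f^{\mathrm{ft}}(t) - \tilde f^{\mathrm{ft}}(t)| &= 0, && |t| \le \omega_1, \\
|f^{\mathrm{ft}}(t) - \tilde f^{\mathrm{ft}}(t)| &= \frac{|t| - \omega_1}{4\omega_1}, && \omega_1 \le |t| \le 2\omega_1, \\
|f^{\mathrm{ft}}(t) - \tilde f^{\mathrm{ft}}(t)| &= \frac{3\omega_1 - |t|}{4\omega_1}, && 2\omega_1 \le |t| \le 3\omega_1,
\end{align*}
% each of the four linear pieces (two on each side of the origin) contributes \omega_1/48
\[
\|f - \tilde f\|^2_{L_2(\mathbb{R})}
= \frac{1}{2\pi} \int |f^{\mathrm{ft}}(t) - \tilde f^{\mathrm{ft}}(t)|^2\, dt
= \frac{1}{2\pi} \cdot 4 \cdot \frac{\omega_1}{48}
= \frac{\omega_1}{24\pi}
\;\ge\; \frac{\omega_1}{48\pi}.
\]
```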

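Equipped with those results, we fix an arbitrary estimator $\hat f_n$ of $f_X$ and consider

$$\begin{aligned}
E_f \|\hat f_n - f\|^2_{L_2(\mathbb{R})} + E_{\tilde f} \|\hat f_n - \tilde f\|^2_{L_2(\mathbb{R})}
&\ge E_f \|\hat f_n - f\|^2_{L_2(\mathbb{R})} + E_f \|\hat f_n - \tilde f\|^2_{L_2(\mathbb{R})} \\
&\quad - \bigl| E_f \|\hat f_n - \tilde f\|^2_{L_2(\mathbb{R})} - E_{\tilde f} \|\hat f_n - \tilde f\|^2_{L_2(\mathbb{R})} \bigr| \\
&\ge \omega_1/(48\pi) - O\Bigl( \sum_{j=1}^n \|(f - \tilde f) * f_{\varepsilon_j}\|_{L_1(\mathbb{R})} \Bigr).
\end{aligned} \tag{6.9}$$

Therefore, we can establish inconsistency by showing that (6.9) is bounded away from
zero for a fixed choice of $\omega_1 > 0$. To this end, we need an upper bound for each
$\|(f - \tilde f) * f_{\varepsilon_j}\|_{L_1(\mathbb{R})}$. Employing the Cauchy density $f_0(x) = [\pi(1 + x^2)]^{-1}$, we use the
Cauchy-Schwarz inequality to obtain

$$\|(f - \tilde f) * f_{\varepsilon_j}\|_{L_1(\mathbb{R})} \le \Bigl( \pi \int |[(f - \tilde f) * f_{\varepsilon_j}](x)|^2 (1 + x^2)\, dx \Bigr)^{1/2}. \tag{6.10}$$

As in the proof of Theorem 2.1, the Fourier representation of the Sobolev norm leads to
the following upper bound for the right-hand side of (6.10):

$$\Bigl( \mathrm{const.} \cdot \int |(f^{\mathrm{ft}}(t) - \tilde f^{\mathrm{ft}}(t)) f_{\varepsilon_j}^{\mathrm{ft}}(t)|^2 + |(f^{\mathrm{ft}\,\prime}(t) - \tilde f^{\mathrm{ft}\,\prime}(t)) f_{\varepsilon_j}^{\mathrm{ft}}(t)|^2 + |(f^{\mathrm{ft}}(t) - \tilde f^{\mathrm{ft}}(t)) f_{\varepsilon_j}^{\mathrm{ft}\,\prime}(t)|^2\, dt \Bigr)^{1/2}.$$

Therefore, we see that (6.9) has the lower bound

$$\omega_1/(48\pi) - O\Bigl( \sum_{j=1}^n \exp(-\sigma_{j,n}^{\gamma}\omega_1^{\gamma}/4) \Bigr) \tag{6.11}$$

when selecting $\omega_1$ sufficiently large. We apply (6.8) so that for appropriate constants
$c_1, c_2 > 0$, (6.11) is bounded below by $c_1\omega_1 - c_2 \exp(-\omega_1^{\gamma}/4)$. Choosing $\omega_1 > 0$ large
enough, while $\omega_0$ is fixed, guarantees a positive lower bound for (6.9) and, hence,
inconsistency. □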